Empirical analysis of Zipf’s law, power law, and lognormal distributions in medical discharge reports
Authors
Abstract
Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. This paper empirically analyses whether text in medical discharge reports follows Zipf’s law, a commonly assumed property of language where word frequency follows a discrete power-law distribution. We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power-law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power-law) provided superior fits to the data. Discharge reports are best fit by the truncated power-law and lognormal distributions. Discharge reports appear to be near-Zipfian, in that the truncated power-law provides superior fits over a pure power-law. Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power-law and lognormal probability priors and non-parametric models that capture power-law behavior.
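The first steps listed in the abstract (tokenising, counting token frequency, and fitting a power law) can be illustrated with a minimal sketch. It is not the authors' pipeline. The tokeniser regex, the toy report strings, and the function names `token_frequencies` and `approx_power_law_alpha` are assumptions made for illustration. The exponent estimate uses the closed-form approximation to the discrete power-law maximum-likelihood estimator, alpha ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)), which is known to be rough when x_min is small.

```python
import math
import re
from collections import Counter


def token_frequencies(texts):
    """Lower-case each text, split it into alphabetic tokens, and count them."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def approx_power_law_alpha(values, xmin=1):
    """Approximate MLE of a discrete power-law exponent for values >= xmin.

    Uses alpha ~= 1 + n / sum(ln(x_i / (xmin - 0.5))). The approximation is
    coarse for small xmin, so treat the result as indicative only.
    """
    tail = [v for v in values if v >= xmin]
    if not tail:
        raise ValueError("no values at or above xmin")
    log_sum = sum(math.log(v / (xmin - 0.5)) for v in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical stand-ins for discharge reports; real MIMIC-III text is not used here.
toy_reports = [
    "Patient admitted with chest pain. Patient discharged home in stable condition.",
    "Patient admitted with fever. Discharged home on antibiotics.",
]

counts = token_frequencies(toy_reports)
ranked = counts.most_common()  # (token, frequency) pairs, most frequent first
alpha = approx_power_law_alpha(counts.values(), xmin=1)

for rank, (token, freq) in enumerate(ranked[:5], start=1):
    print(rank, token, freq)
print("approximate exponent:", round(alpha, 3))
```

Comparing the power law against lognormal, exponential, stretched exponential, and truncated power-law alternatives requires likelihood-ratio tests that this sketch does not implement; a dedicated fitting library would be the usual choice for that step.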
Similar resources
Mobile Phone Social Networks: Beyond Power-Law and Lognormal Distributions
We analyze a massive social network gathered from a large mobile phone operator’s records, comprised of millions of users and tens of millions of calls. We examine the following questions: what is the distribution of phone calls per customer; total talk time per customer; and distinct partners per customer? We find that these distributions are skewed, and that they significantly deviate from wh...
Power-Law Distributions in Empirical Data
Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA; Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA; Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48...
Power-Law Distributions in Empirical Data
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty ...
Sampling power-law distributions
Power-law distributions describe many phenomena related to rock fracture. Data collected to measure the parameters of such distributions only represent samples from some underlying population. Without proper consideration of the scale and size limitations of such data, estimates of the population parameters, particularly the exponent D, are likely to be biased. A Monte Carlo simulation of the s...
A Brief History of Generative Models for Power Law and Lognormal Distributions Draft Manuscript
Power law distributions are an increasingly common model for computer science applications; for example, they have been used to describe file size distributions and in- and out-degree distributions for the Web and Internet graphs. Recently, the similar lognormal distribution has also been suggested as an appropriate alternative model for file size distributions. In this paper, we briefly survey s...
Journal
Journal title: International Journal of Medical Informatics
Year: 2021
ISSN: 1386-5056, 1872-8243
DOI: https://doi.org/10.1016/j.ijmedinf.2020.104324